Keynote Talks


Tuesday, Sept. 6, 9:30-10:30, Plenary Hall

The evolution of microphone array beamformers

Dr. Jens Meyer and Dr. Gary W. Elko, mh acoustics

Abstract: Today’s acoustic front ends generally have two or more – sometimes many more – microphones. Each microphone represents a sample point of the acoustic field. By sampling the sound field at multiple locations, one can create a spatial filter by combining the microphone signals through signal processing. The spatial filter, also called a beamformer, enables the discrimination of sound arriving from different directions. This directional characteristic is generally the property that governs beamformer design.
Beamformers have a long history: the first implementations were purely mechanical. Advances in electronics led to the first analog beamformers, and today’s beamformers are implemented digitally. With growing signal processing capabilities, the complexity of beamformers has increased, from simple delay & sum to filter & sum, small superdirectional differential arrays, and, more recently, modal beamformers and beyond.
This presentation will review different beamformers with an emphasis on more recent designs and provide a brief outlook into the future.
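To make the core idea concrete, the sketch below implements the simplest of these designs, a delay-and-sum beamformer for a uniform linear array, in Python/NumPy. It is a minimal illustration, not code from the talk; the geometry conventions (microphone positions along a single axis, steering angle measured from broadside, far-field plane-wave source) are assumptions made here.

    # Minimal delay-and-sum beamformer sketch (illustrative assumptions:
    # uniform linear array, angle measured from broadside, far-field source).
    import numpy as np

    def delay_and_sum(mic_signals, mic_positions, angle_deg, fs, c=343.0):
        """mic_signals: (num_mics, num_samples); mic_positions: (num_mics,) in m."""
        num_mics, num_samples = mic_signals.shape
        # Relative arrival delay of a plane wave from angle_deg at each mic.
        delays = mic_positions * np.sin(np.deg2rad(angle_deg)) / c
        # Compensate the (fractional) delays as linear phase shifts.
        freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
        spectra = np.fft.rfft(mic_signals, axis=1)
        phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        # Averaging the aligned channels reinforces sound from the steered
        # direction and partially cancels sound from other directions.
        return np.fft.irfft(np.mean(spectra * phase, axis=0), n=num_samples)

Filter & sum generalizes this scheme by replacing the pure delays with per-microphone filters, and differential and modal designs impose further structure on those filters.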


Biographies

Jens Meyer

Jens Meyer received his Dipl.-Ing. (M.Sc.) and Dr.-Ing. (PhD) from Darmstadt University of Technology (Germany). In 2002, he cofounded mh acoustics. Jens spent three years developing hearing-diagnostics technologies for Mimosa Acoustics before joining mh acoustics full time. Together with Gary Elko, he developed the Eigenmike® microphone array. First sold in 2003, the Eigenmike array was the first microphone capable of recording Higher Order Ambisonics (HOA). To this day, his main technical interest is acoustic signal processing, with an emphasis on array beamforming technologies. He is the author of numerous publications and patents.

Gary W. Elko


In 1970 I went with my older brother to pick out a hi-fi system with birthday money. One year later I did the same thing, but with less money to spend I had to purchase something cheaper. The difference in audio quality between the two loudspeaker systems sparked an intense interest in understanding why. The pursuit of that question set me on my path to acoustics and signal processing. I went to Cornell to study Electrical Engineering and then to Penn State for postgraduate degrees in Acoustics. Nearing graduation, my PhD advisor, Jiri Tichy, introduced me to Jim West, who worked at Bell Labs. I got incredibly lucky: a postdoc working with Jim got homesick, so an opening was available just as I started looking for a job. My job search was over. The first project I worked on at Bell Labs was a large (2 x 2 m) planar microphone array with 400 microphones for a large auditorium. That project was a step toward Jim Flanagan’s dream of a microphone system that could be fabricated as wallpaper and enable high-quality hands-free audio everywhere. That work led to smaller devices and research into superdirectional differential microphone arrays and the related area of communication acoustics signal processing.

My dream job evaporated as the telecom boom ended at the turn of the century. That bust forced us to become entrepreneurs, and mh acoustics was started with my colleagues Jens Meyer and Tomas Gaensler, later joined by Eric Diethorn. Fortunately, the skills and interests we developed at Bell Labs were still relevant, and some of mh’s technology has made it into highly successful products. The past 20 years have sped by, and the mh team has been fortunate to be able to choose which projects to tackle while having fun working on challenging front-end acoustic signal processing solutions for ourselves and our customers.


Wednesday, Sept. 7, 9:00-10:00, Plenary Hall

Pushing the limits of speech enhancement technology

Dr. Shoko Araki, NTT Communication Science Laboratories

Abstract: Speech enhancement has a long history, dating back to even before the first IWAENC in 1989. In recent decades, we have seen the emergence of speech signal processing techniques, such as dereverberation and source separation, that can handle speech signals recorded in the real world. This progress has stimulated interest in the field, which remains an area of active and wide-ranging research. More recently, deep learning has revolutionized many technological fields, and it has also had a significant impact on speech enhancement. This advance has not only dramatically improved performance but has also made it possible to perform tasks that were previously too difficult, such as the overall optimization of speech enhancement for a backend objective (e.g., automatic speech recognition) and speech enhancement using modalities other than speech.
For more than two decades, the speaker’s research group has explored source separation, dereverberation, and noise reduction for handling realistic natural conversations recorded with distant microphones. In this talk, I will introduce our recent work on speech enhancement techniques based on multi-channel signal processing, as well as new speech enhancement concepts brought about by deep learning, such as target speech extraction (or selective hearing).
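The abstract names noise reduction among the classic tasks that deep learning has since reshaped. As a point of reference, the sketch below shows one textbook single-channel approach, spectral subtraction, in Python/SciPy. It is a minimal illustration, not code from the talk; the frame length, the assumption that the leading frames are speech-free, and the over-subtraction and flooring parameters are all illustrative choices.

    # Minimal spectral-subtraction sketch (illustrative parameters;
    # assumes the first few frames contain noise only).
    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtraction(noisy, fs, noise_frames=10, alpha=2.0, floor=0.05):
        # Analyze the noisy signal with a short-time Fourier transform.
        _, _, X = stft(noisy, fs=fs, nperseg=512)
        mag, phase = np.abs(X), np.angle(X)
        # Estimate the noise magnitude spectrum from the leading frames.
        noise_mag = np.mean(mag[:, :noise_frames], axis=1, keepdims=True)
        # Over-subtract the noise estimate and apply a spectral floor to
        # limit the "musical noise" artifacts typical of this method.
        clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
        # Resynthesize using the noisy phase.
        _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
        return enhanced

Modern deep-learning approaches can be seen as replacing this fixed subtraction rule with learned time-frequency masks or filters, optimized end-to-end, in some cases directly for a backend objective such as speech recognition.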


Biography: Shoko Araki is a Senior Research Scientist at NTT Communication Science Laboratories, NTT Corporation, Japan, where she currently leads the Signal Processing Research Group. Since joining NTT in 2000, she has been researching acoustic signal processing, microphone array signal processing, blind speech separation, meeting diarization, and auditory scene analysis. She has received several awards for her research, including the Best Paper Award of the IWAENC in 2003, the IEEE Signal Processing Society (SPS) Best Paper Award in 2014, and the Young Scientists’ Prize of the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2014. She is a Fellow of the IEEE. She was a member of the IEEE SPS Audio and Acoustic Signal Processing Technical Committee (AASP-TC) from 2014 to 2019 and currently serves as its vice chair. She also serves as vice president of the Acoustical Society of Japan (ASJ) (2021-2022).


Thursday, Sept. 8, 9:00-10:00, Plenary Hall

Spatial acquisition, digital archiving, and interactive auralization of church acoustics

Prof. Dr. Toon van Waterschoot, KU Leuven

Abstract: Church acoustics form a key element of research in early music studies and cultural heritage preservation. In this talk, we will outline a methodology for the spatial acquisition, digital archiving, and interactive auralization of church acoustics. Along the way, we will touch upon various acoustic signal processing challenges. These include the spatial analysis, interpolation, and compression of acoustic impulse responses on the acquisition and archiving side, and fast convolution, acoustic feedback control, and perceptual evaluation on the auralization side. The presented work will be illustrated with a case study on the Nassau Chapel, an early 16th-century chapel in Brussels with remarkable acoustics, which has been auralized at the KU Leuven Library of Voices.
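To give a flavor of the auralization side, the sketch below renders a dry (anechoic) recording through a measured room impulse response using FFT-based overlap-add convolution, the kind of fast convolution the abstract refers to. It is a minimal Python/SciPy illustration under assumed inputs, not code from the presented system.

    # Minimal auralization sketch: convolve a dry signal with a measured
    # room impulse response (RIR). Inputs are assumed to be 1-D arrays
    # at the same sampling rate.
    import numpy as np
    from scipy.signal import oaconvolve

    def auralize(dry_signal, rir):
        # Overlap-add convolution computes the result block-by-block with
        # FFTs, which is far cheaper than direct convolution for the long
        # impulse responses typical of reverberant churches.
        wet = oaconvolve(dry_signal, rir, mode="full")
        # Normalize to avoid clipping when exporting to fixed-point audio.
        return wet / np.max(np.abs(wet))

For interactive, real-time auralization, the convolution is typically partitioned into short blocks so that the processing latency stays low while long impulse responses remain affordable.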


Biography: Toon van Waterschoot received MSc (2001) and PhD (2009) degrees in Electrical Engineering, both from KU Leuven, Belgium, where he is currently an Associate Professor, ERC Consolidator Grantee, and Head of the Stadius Center for Dynamical Systems, Signal Processing and Data Analytics. He has also held teaching and research positions at Delft University of Technology in the Netherlands and the University of Lugano in Switzerland. His research interests are in signal processing, machine learning, and numerical optimization, applied to acoustic signal enhancement, acoustic modeling, audio analysis, and audio reproduction. He has been the Scientific Coordinator of several major European research projects: the FP7-PEOPLE Marie Curie Initial Training Network “Dereverberation and Reverberation of Audio, Music, and Speech” (DREAMS, 2013-2016), the H2020 ERC Consolidator Grant “The Spatial Dynamics of Room Acoustics” (SONORA, 2018-2023), and the H2020 MSCA European Training Network “Service-Oriented Ubiquitous Network-Driven Sound” (SOUNDS, 2021-2024).